Abstract: Recently, flash memories have become a competitive solution for mass storage. Flash memories have rather different properties from rotary hard drives: writing is constrained, and each cell can endure only a limited number of erases. Therefore, the design goals for flash memory systems are quite different from those for other memory systems. In this paper, we consider the problem of coding efficiency. We define "coding efficiency" as the amount of information that one flash memory cell can record per unit cost. Because each flash memory cell can endure a roughly fixed number of erases, the cost of data recording is well defined. We define "payload" as the amount of information that one flash memory cell can represent at a particular moment. Using information-theoretic arguments, we prove a coding theorem for achievable coding rates. We prove upper and lower bounds for coding efficiency. We show that there exists a fundamental trade-off between "payload" and "coding efficiency". The results in this paper may provide useful insights on the design of future flash memory systems.

I. Introduction 

Recently, flash memories have become a competitive solution for mass storage. Compared with conventional rotary hard drives, flash memories offer high random-access read speed because there is no mechanical seek time. Flash memory storage devices are also lighter, more power efficient, and more resistant to kinetic shock. They are therefore becoming desirable for many applications, ranging from high-speed servers in data centers to portable devices.

Flash memories are one type of solid-state memory. Each piece of flash memory usually contains multiple arrays of flash memory cells. Each memory cell is a transistor with a floating gate. Information is recorded in a memory cell by injecting electrons into, and removing electrons from, the floating gate. Injecting electrons is called programming, and removing electrons is called erasing. Programming increases the threshold voltage level of the memory cell, while erasing decreases it. The threshold voltage level of a memory cell is the control-gate voltage at which the transistor becomes conducting. During a read, the threshold voltage level is detected, so the recorded information can be recovered.

The memory cells are organized into pages and then into blocks. Programming is page-wise and erasing is block-wise. Usually, a memory block is first erased, so that all memory cells within the block return to an initial threshold voltage level. After the erase operation, the pages in the block are programmed (possibly multiple times) until the normal threshold voltage level ranges are used up. Then the memory block is erased again for further use.

One challenge for flash memories is that the number of erase operations that one memory cell can withstand is quite limited. For current commercial flash memories, the maximal number of block erase operations ranges from 5,000 to 100,000. After that many erase operations, the flash memory cell becomes broken or unreliable. Therefore, data encoding methods must be carefully designed to address this issue.

In fact, flash memories can be considered one type of write-once memory. Write-once memories were first discussed in the seminal work by Rivest and Shamir [1]. Earlier examples of write-once memories include digital optical disks, punched paper tapes, punched cards, and programmable read-only memories. Rivest and Shamir show that, by using advanced data encoding methods, write-once memories can be rewritten. In [1], a theorem for the achievable data recording rates of binary write-once memories is proven using combinatorial arguments. Since then, many data encoding methods for rewriting write-once memories have been proposed; see, for example, [2], [3].

In this paper, we consider a coding efficiency problem for data encoding on flash memories. Unlike for other types of computer memories, the cost of data encoding can be well defined for flash memories. That is, the cost of each erase operation can be defined based on the cost of the flash memory block and the total number of erase operations that the block can endure. The coding efficiency problem is therefore the problem of recording more information using fewer erase operations. To the best of our knowledge, this design problem for flash memories has not been discussed before.

We assume that one flash memory block has $N$ cells, and each cell can take $K$ voltage levels. We assume that the data encoding scheme uses the memory block for $T$ rounds between two consecutive erase operations. That is, in the first round, a message $M[1]$ is recorded using the block, in the second round a new message $M[2]$ is recorded, and so on. Suppose that $N l_t$ bits are recorded during the $t$-th round. We define the payload $p$ and coding efficiency $c$ as

t=l t=l 
where $\alpha$ is a constant depending on the type of the memory block, e.g., NOR type, NAND type, single-level cell, or multi-level cell. The constant $\alpha$ may be used to reflect the cost of the flash memory block. The coding efficiency measures the amount of recorded information per voltage level cost. We may also define the voltage level cost per recorded bit, which is exactly $1/c$.

In this paper, we first prove a coding theorem for achievable rates of data encoding on flash memories using information-theoretic arguments. Using this coding theorem, we prove an upper bound for the optimal coding efficiency. We also show a lower bound for the optimal coding efficiency using a specific coding scheme. Surprisingly, we find that there exists a tradeoff between the optimal coding efficiency and the payload. These results may provide useful insights and tools for designing future flash memory storage systems.

The rest of this paper is organized as follows. In Section II, we present the coding theorem for achievable coding rates. In Section III, we show the upper bound of the optimal coding efficiency. In Section IV, we present the lower bound for the optimal coding efficiency using a specific coding scheme. The coding-efficiency-to-payload tradeoff is discussed in Section V. Some concluding remarks are presented in Section VI.

II. Coding Theorem 

We consider a memory block with $N$ memory cells. Each memory cell can take $K$ threshold voltage levels, that is, each memory cell can be in one of the states $0, 1, \ldots, K-1$. After one erase operation, all memory cells are in the state $K-1$. During each programming process, the state of each cell can be decreased but never increased. Assume that the memory block can be reliably used for $T$ rounds of information recording, where messages $M(1), M(2), \ldots, M(t), \ldots, M(T)$ are recorded. We define the corresponding data rate in the $t$-th round as $l(t) = \log_2(|M(t)|)/N$, where $|M(t)|$ denotes the alphabet size of the message $M(t)$. In this case, we say that the sequence of data rates $l(t)$, $t = 1, \ldots, T$, is achievable. We assume that all $T$ messages are statistically independent. We denote the state of the $n$-th cell in the block during time $t$ by $X_n(t)$. We use the notation $X_1^N(t)$ to denote the sequence $X_1(t), X_2(t), \ldots, X_N(t)$. Similarly, $X_1^n(t)$ denotes the sequence $X_1(t), X_2(t), \ldots, X_n(t)$, where $1 \le n \le N$. We use $H(\cdot)$ to denote the entropy and conditional entropy functions as in [4].
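The cell model above can be sketched in a few lines of Python. This is only an illustration under the section's abstraction (levels 0 to K - 1, erase resets to K - 1, programming never raises a level); the class name `FlashBlock` and the helper `data_rate` are hypothetical names introduced here, not part of the paper.

```python
import math


class FlashBlock:
    """Toy model of one block: N cells, each at a level in 0..K-1."""

    def __init__(self, n_cells, n_levels):
        self.n_cells = n_cells
        self.n_levels = n_levels
        # After an erase, every cell sits at the top level K - 1.
        self.states = [n_levels - 1] * n_cells

    def erase(self):
        """Block-wise erase: every cell returns to level K - 1."""
        self.states = [self.n_levels - 1] * self.n_cells

    def program(self, new_states):
        """One recording round: a level may decrease but never increase."""
        if len(new_states) != self.n_cells:
            raise ValueError("expected one level per cell")
        for old, new in zip(self.states, new_states):
            if not 0 <= new <= old:
                raise ValueError("programming cannot raise a cell's level")
        self.states = list(new_states)


def data_rate(message_alphabet_size, n_cells):
    """l(t) = log2(|M(t)|) / N, in bits per cell."""
    return math.log2(message_alphabet_size) / n_cells
```

For example, a block with 4 cells that records one of 16 messages in a round has a data rate of one bit per cell for that round.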

Theorem 2.1: A sequence of data rates $l(t)$, $t = 1, \ldots, T$, is achievable if and only if there exist random variables $U(1), \ldots, U(T)$, jointly distributed with a probability distribution $P(U(1), \ldots, U(T))$, such that

$$
\begin{aligned}
&P(U(t) = j \mid U(t-1) = i) = 0, \quad \text{if } j > i, \text{ for } t = 2, \ldots, T, \\
&l(t) \le H(U(t) \mid U(t-1)), \quad \text{for } t = 2, \ldots, T, \\
&l(1) \le H(U(1)).
\end{aligned}
\tag{2}
$$

By convention, $U(0) = K - 1$ with probability 1.
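To make the rate conditions concrete, the following sketch evaluates the bounds $H(U(t) \mid U(t-1))$ for one particular choice of $U(1), \ldots, U(T)$. As a simplifying assumption that the theorem itself does not require, the variables form a Markov chain started at level $K - 1$; the function names and the row-stochastic matrix layout (`transition[i][j]` is the probability of moving from level `i` to level `j`) are illustrative choices.

```python
import math


def conditional_entropy(prev_dist, transition):
    """H(U(t) | U(t-1)) in bits, from P(U(t-1)) and rows P(U(t)=j | U(t-1)=i)."""
    h = 0.0
    for i, p_i in enumerate(prev_dist):
        for p_ji in transition[i]:
            if p_i > 0 and p_ji > 0:
                h -= p_i * p_ji * math.log2(p_ji)
    return h


def rate_bounds(n_levels, transitions):
    """Per-round bounds on l(t) for a Markov chain with U(0) = K - 1."""
    dist = [0.0] * n_levels
    dist[n_levels - 1] = 1.0  # U(0) = K - 1 with probability 1
    bounds = []
    for transition in transitions:
        for i, row in enumerate(transition):
            # Monotonicity condition: no probability mass on levels j > i.
            if any(p > 0 for p in row[i + 1:]):
                raise ValueError("transition raises a cell level")
        bounds.append(conditional_entropy(dist, transition))
        # Propagate the marginal distribution to the next round.
        dist = [sum(dist[i] * transition[i][j] for i in range(n_levels))
                for j in range(n_levels)]
    return bounds
```

With $K = 2$ and a cell at level 1 dropping to level 0 with probability 1/2 in each round, the first round supports up to 1 bit per cell and the second round up to 0.5 bits per cell, since half of the cells are already at level 0.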

Proof: The achievability part is proven by random binning. For the $t$-th round of data recording, we construct a random code by throwing the typical sequences of $U(t)$ into $\exp\{N l(t)\}$ bins uniformly at random. The message $m(t)$ is encoded by finding a sequence $X_1^N(t)$ in the $m(t)$-th bin such that $X_1^N(t)$ is jointly typical with $X_1^N(t-1)$. If no such sequence can be found, an encoding error is declared.

Suppose that $l(t) < H(U(t) \mid U(t-1)) - 2\epsilon$, where $\epsilon$ is an arbitrarily small positive number. Then the probability of encoding error can be upper bounded as follows.

$$
\begin{aligned}
P(\text{error}) &\le \left(1 - \frac{1}{\exp(N l(t))}\right)^{N_1} \\
&\overset{(a)}{\le} \exp\left(-\frac{N_1}{\exp(N l(t))}\right) \\
&\overset{(b)}{\le} \exp\left(-\frac{\exp(N(H(U(t) \mid U(t-1)) - \epsilon))}{\exp\{N(H(U(t) \mid U(t-1)) - 2\epsilon)\}}\right) \\
&\le \exp(-\exp(\epsilon N))
\end{aligned}
\tag{3}
$$

where $N_1$ denotes the number of typical sequences $X_1^N(t)$ that are jointly typical with $X_1^N(t-1)$, (a) follows from the inequality $(1 - x) \le \exp(-x)$ for $0 \le x \le 1$, and (b) follows from the fact that $N_1 \ge \exp\{N(H(U(t) \mid U(t-1)) - \epsilon)\}$ together with the assumption on $l(t)$. The achievability part of the proof then follows from the fact that $\epsilon$ can be taken arbitrarily small.

We prove the converse part by constructing random variables $U(1), \ldots, U(T)$ that satisfy the conditions in the theorem. Assume that there exists at least one coding scheme that achieves the data rates $l(t)$, $t = 1, \ldots, T$.

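In the first step, we wish to show

$$ H(M(t)) \le H(X_1^N(t) \mid X_1^N(t-1)). \tag{4} $$

This is because, on the one hand,

$$
\begin{aligned}
H(M(t), X_1^N(t) \mid X_1^N(t-1)) &= H(X_1^N(t) \mid X_1^N(t-1)) + H(M(t) \mid X_1^N(t), X_1^N(t-1)) \\
&\overset{(a)}{=} H(X_1^N(t) \mid X_1^N(t-1))
\end{aligned}
\tag{5}
$$

where (a) follows from the fact that $M(t)$ can be completely determined by observing $X_1^N(t)$. On the other hand,

$$
\begin{aligned}
H(M(t), X_1^N(t) \mid X_1^N(t-1)) &= H(M(t) \mid X_1^N(t-1)) + H(X_1^N(t) \mid M(t), X_1^N(t-1)) \\
&\overset{(a)}{=} H(M(t)) + H(X_1^N(t) \mid M(t), X_1^N(t-1))
\end{aligned}
\tag{6}
$$

where (a) follows from the fact that $M(t)$ is independent of $X_1^N(t-1)$. Because the last conditional entropy in (6) is nonnegative, comparing (5) and (6) gives (4).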

In the second step, we can show that

$$ H(M(t)) \le \sum_{n=1}^{N} H(X_n(t) \mid X_n(t-1)). \tag{7} $$

This is because

$$
\begin{aligned}
H(X_1^N(t) \mid X_1^N(t-1)) &= \sum_{n=1}^{N} H(X_n(t) \mid X_1^{n-1}(t), X_1^N(t-1)) \\
&\le \sum_{n=1}^{N} H(X_n(t) \mid X_n(t-1))
\end{aligned}
\tag{8}
$$

where the last inequality follows from the fact that conditioning does not increase entropy.

Let us define random variables $Z, U(1), U(2), \ldots, U(T)$ as follows. The random variable $Z$ takes values in $\{1, 2, \ldots, N\}$ uniformly at random, and

$$ U(t) = X_n(t), \quad \text{if } Z = n. \tag{9} $$

The probability distribution of the random variables $Z, U(1), U(2), \ldots, U(T)$ can be factored as follows.

$$ P(Z) \prod_{t=1}^{T} P(U(t) \mid U(1), \ldots, U(t-1), Z) \tag{10} $$

It can be checked that

$$ P(U(t) = j \mid U(t-1) = i) = 0, \quad \text{if } j > i. \tag{11} $$

Finally, we wish to show that

$$ N l(t) = H(M(t)) \le N H(U(t) \mid U(t-1)). \tag{12} $$

This is because

$$
\begin{aligned}
H(M(t)) &\le \sum_{n=1}^{N} H(X_n(t) \mid X_n(t-1)) \\
&\overset{(a)}{=} N H(U(t) \mid U(t-1), Z) \\
&\overset{(b)}{\le} N H(U(t) \mid U(t-1))
\end{aligned}
\tag{13}
$$

where (a) follows from the definition of $Z$, and (b) follows from the fact that conditioning does not increase entropy.

Therefore, we have constructed random variables $U(1), \ldots, U(T)$ that satisfy the conditions in the theorem. The theorem is proven.

■ 

III. Upper Bound 

In this section, we prove an upper bound for the achievable coding efficiency. The coding efficiency can be calculated from Theorem 2.1 by forming an optimization problem. Let us define a random variable $V(t) = U(t-1) - U(t)$ with alphabet $\{0, 1, \ldots, K-1\}$. With a given payload $p$, the optimization problem is as follows.



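$$ \min_{P(V(1), \ldots, V(t), \ldots, V(T))} E\left[\sum_t V(t)\right] \tag{14} $$

$$ \text{Subject to: } \sum_t H(V(t) \mid U(t-1)) \ge Tp \tag{15} $$

$$ P\left(\sum_t V(t) \ge K\right) = 0 \tag{16} $$

By convention, $U(0) = K - 1$ with probability 1. It should be clear that the coding efficiency satisfies

$$ c \le \frac{\alpha T p}{\sum_t E(V(t)^*)} \tag{17} $$

where $V(t)^*$ denotes the minimizer of the optimization problem.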



However, the above optimization problem is difficult to solve in closed form. We consider instead a relaxed optimization problem. First, we remove the constraint in Eqn. (16). Second, we relax the constraint $\sum_t H(V(t) \mid U(t-1)) \ge Tp$ to $\sum_t H(V(t)) \ge Tp$, because conditioning does not increase entropy. Thus, the original optimization problem becomes

$$ \min_{P(V(1), \ldots, V(t), \ldots, V(T))} E\left[\sum_t V(t)\right] \quad \text{subject to: } \sum_t H(V(t)) \ge Tp. \tag{18} $$

In a final step, because the constraint and objective functions depend only on the marginal distributions of $V(t)$, we may further relax the above optimization problem by replacing the joint distribution

$$ P(V(1), \ldots, V(t), \ldots, V(T)) \tag{19} $$

with a set of pseudo-marginal distributions

$$ P(V(1)), \ldots, P(V(t)), \ldots, P(V(T)). \tag{20} $$

The pseudo-marginal distributions may or may not correspond to a joint distribution. The final relaxed optimization problem is thus as follows.

$$ \min_{P(V(1)), \ldots, P(V(t)), \ldots, P(V(T))} E\left[\sum_t V(t)\right] \quad \text{subject to: } \sum_t H(V(t)) \ge Tp. \tag{21} $$

Using the Lagrangian method, we find that the optimal distribution for $V(t)$ takes the following form

$$ P(V(t) = j) = \frac{\exp(-\beta_t j)}{\sum_{s=0}^{K-1} \exp(-\beta_t s)} \tag{22} $$

for a certain parameter $\beta_t > 0$. Let us define the cost function $\mathrm{cost}(\beta_t)$ and rate function $\mathrm{rate}(\beta_t)$ at the $t$-th data encoding round as follows.

$$ \mathrm{cost}(\beta_t) = E[V(t)], \qquad \mathrm{rate}(\beta_t) = H(V(t)) \tag{23} $$

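where $V(t)$ has the probability distribution in Eqn. (22). Both functions have closed-form expressions,

$$ \mathrm{cost}(\beta_t) = \frac{\sum_{j=0}^{K-1} j \exp(-\beta_t j)}{\sum_{s=0}^{K-1} \exp(-\beta_t s)}, \qquad \mathrm{rate}(\beta_t) = \beta_t\,\mathrm{cost}(\beta_t) + \log\left(\sum_{s=0}^{K-1} \exp(-\beta_t s)\right). \tag{24} $$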
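The cost and rate functions are easy to evaluate numerically. The sketch below assumes the exponential form of Eqn. (22) and measures entropy in nats (divide by $\ln 2$ to obtain bits); the function names are illustrative. It also cross-checks the closed-form rate, $\beta\,\mathrm{cost}(\beta)$ plus the logarithm of the normalizing sum, against a direct entropy computation.

```python
import math


def gibbs_distribution(beta, n_levels):
    """P(V = j) proportional to exp(-beta * j) for j = 0..K-1."""
    weights = [math.exp(-beta * j) for j in range(n_levels)]
    z = sum(weights)
    return [w / z for w in weights]


def cost(beta, n_levels):
    """E[V]: expected number of levels consumed by a cell in one round."""
    dist = gibbs_distribution(beta, n_levels)
    return sum(j * p for j, p in enumerate(dist))


def rate(beta, n_levels):
    """H(V) in nats via the closed form beta * cost + log(normalizing sum)."""
    z = sum(math.exp(-beta * j) for j in range(n_levels))
    return beta * cost(beta, n_levels) + math.log(z)


def entropy(dist):
    """Direct entropy in nats, used only as a cross-check."""
    return -sum(p * math.log(p) for p in dist if p > 0)
```

At $\beta = 0$ the distribution is uniform, so for $K = 8$ the cost is 3.5 levels and the rate is $\ln 8$ nats, that is, 3 bits.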



Theorem 3.1: The coding efficiency $c$ is upper bounded by

$$ c \le \frac{\alpha \sum_t \mathrm{rate}(\beta_t)}{\sum_t \mathrm{cost}(\beta_t)} \tag{25} $$

where $\beta_t$ corresponds to the solution to the relaxed optimization problem in Eqn. (21).



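Proof: Relaxing the constraints of the minimization problem can only lower its optimal cost, and hence can only raise the bound in Eqn. (17). At the relaxed optimum the entropy constraint holds with equality, so $Tp = \sum_t \mathrm{rate}(\beta_t)$ and the minimal cost is $\sum_t \mathrm{cost}(\beta_t)$. ■

In our further discussion, we need to define a stage coding efficiency function

$$ f(\beta) = \frac{\mathrm{rate}(\beta)}{\mathrm{cost}(\beta)}. \tag{26} $$

Lemma 3.2:

$$ \frac{d(\mathrm{rate}(\beta))}{d(\mathrm{cost}(\beta))} = \beta. \tag{27} $$

Proof:

$$
\begin{aligned}
\frac{d\,\mathrm{rate}(\beta)}{d\,\mathrm{cost}(\beta)}
&= \frac{\mathrm{cost}(\beta) + \beta\,\mathrm{cost}'(\beta) + \sum_k (-k) \exp(-\beta k) / \sum_s \exp(-\beta s)}{\mathrm{cost}'(\beta)} \\
&= \frac{\mathrm{cost}(\beta) + \beta\,\mathrm{cost}'(\beta) - \mathrm{cost}(\beta)}{\mathrm{cost}'(\beta)} = \beta
\end{aligned}
\tag{28}
$$

where the derivatives on the right-hand side are with respect to $\beta$. ■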
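Lemma 3.2 can be checked numerically with central finite differences. The sketch below is a sanity check rather than a proof; it assumes the exponential distribution of Eqn. (22), natural logarithms, and an illustrative step size of $10^{-5}$.

```python
import math


def cost(beta, n_levels):
    """E[V] when P(V = j) is proportional to exp(-beta * j)."""
    weights = [math.exp(-beta * j) for j in range(n_levels)]
    return sum(j * w for j, w in enumerate(weights)) / sum(weights)


def rate(beta, n_levels):
    """H(V) in nats: beta * cost + log(normalizing sum)."""
    z = sum(math.exp(-beta * j) for j in range(n_levels))
    return beta * cost(beta, n_levels) + math.log(z)


def rate_cost_slope(beta, n_levels, step=1e-5):
    """Finite-difference estimate of d rate / d cost at the given beta."""
    d_rate = rate(beta + step, n_levels) - rate(beta - step, n_levels)
    d_cost = cost(beta + step, n_levels) - cost(beta - step, n_levels)
    return d_rate / d_cost
```

For any $\beta$ in the tested range, `rate_cost_slope(beta, 8)` agrees with `beta` up to the finite-difference error.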

Lemma 3.3: The function $\mathrm{cost}(\beta)$ is a decreasing function of $\beta$.

Proof: To show that $\mathrm{cost}(\beta)$ is decreasing, it is sufficient to show that $\log(\mathrm{cost}(\beta))$ is decreasing. The derivative of $\log(\mathrm{cost}(\beta))$ is

$$ \frac{\sum_{k=0}^{K-1} k \exp(-k\beta)}{\sum_{k=0}^{K-1} \exp(-k\beta)} - \frac{\sum_{k=0}^{K-1} k^2 \exp(-k\beta)}{\sum_{k=0}^{K-1} k \exp(-k\beta)}. \tag{29} $$

By the Cauchy-Schwarz inequality, we have

$$ \left(\sum_{k=0}^{K-1} k \exp(-k\beta)\right)^2 \le \left(\sum_{k=0}^{K-1} k^2 \exp(-k\beta)\right) \left(\sum_{k=0}^{K-1} \exp(-k\beta)\right) \tag{30} $$

and equality holds only when $\beta$ goes to infinity. It thus follows that the derivative of $\log(\mathrm{cost}(\beta))$ is strictly negative for any finite $\beta$. The lemma follows. ■

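Lemma 3.4: The function $f(\beta)$ is an increasing function of $\beta$.

Proof: The derivative of $f(\beta)$ is given in Eqn. (31). The lemma is proven if we can show that

$$ \frac{\sum_{k=0}^{K-1} \exp(-k\beta) \sum_{k=0}^{K-1} k^2 \exp(-k\beta)}{\left(\sum_{k=0}^{K-1} k \exp(-k\beta)\right)^2} \ge 1. \tag{32} $$

That is,

$$ \left(\sum_{k=0}^{K-1} k \exp(-k\beta)\right)^2 \le \left(\sum_{k=0}^{K-1} \exp(-k\beta)\right) \left(\sum_{k=0}^{K-1} k^2 \exp(-k\beta)\right). \tag{33} $$

This is indeed the case by the Cauchy-Schwarz inequality,

$$ \left(\sum_k x_k y_k\right)^2 \le \left(\sum_k x_k^2\right) \left(\sum_k y_k^2\right), \tag{34} $$

applied with $x_k = \exp(-k\beta/2)$ and $y_k = k \exp(-k\beta/2)$.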
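The monotonicity claims of Lemmas 3.3 and 3.4 can be spot-checked on a grid of $\beta$ values. The sketch below assumes $K = 8$ and a grid from 0.1 to 5.0, both arbitrary illustrative choices; `is_strictly_monotone` is a hypothetical helper introduced here.

```python
import math


def cost(beta, n_levels):
    """E[V] when P(V = j) is proportional to exp(-beta * j)."""
    weights = [math.exp(-beta * j) for j in range(n_levels)]
    return sum(j * w for j, w in enumerate(weights)) / sum(weights)


def rate(beta, n_levels):
    """H(V) in nats: beta * cost + log(normalizing sum)."""
    z = sum(math.exp(-beta * j) for j in range(n_levels))
    return beta * cost(beta, n_levels) + math.log(z)


def stage_efficiency(beta, n_levels):
    """f(beta) = rate(beta) / cost(beta)."""
    return rate(beta, n_levels) / cost(beta, n_levels)


def is_strictly_monotone(values, increasing):
    """True if consecutive values strictly increase (or strictly decrease)."""
    pairs = list(zip(values, values[1:]))
    if increasing:
        return all(a < b for a, b in pairs)
    return all(a > b for a, b in pairs)


betas = [0.1 * i for i in range(1, 51)]
costs = [cost(b, 8) for b in betas]
efficiencies = [stage_efficiency(b, 8) for b in betas]
```

On this grid `costs` is strictly decreasing and `efficiencies` is strictly increasing, in line with the two lemmas.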



Theorem 3.5: In the solution to the optimization problem in Eqn. (21),

$$ \beta_1 = \beta_2 = \cdots = \beta_t = \cdots = \beta_T = \beta. \tag{35} $$

Therefore, the coding efficiency satisfies

$$ c \le \frac{\alpha\,\mathrm{rate}(\beta)}{\mathrm{cost}(\beta)}. \tag{36} $$

Proof: The theorem is proven by contradiction. Suppose that in the optimal solution for Eqn. (21) there exist $\beta_s$ and $\beta_t$ such that $\beta_s > \beta_t$. According to Lemma 3.3, $\mathrm{cost}(\beta_s) < \mathrm{cost}(\beta_t)$. We may modify $\beta_s$ and $\beta_t$ slightly into $\beta_s - \Delta\beta_s$ and $\beta_t + \Delta\beta_t$, such that

$$ \mathrm{cost}(\beta_s - \Delta\beta_s) = \mathrm{cost}(\beta_s) + \Delta\mathrm{cost} \tag{37} $$

$$ \mathrm{cost}(\beta_t + \Delta\beta_t) = \mathrm{cost}(\beta_t) - \Delta\mathrm{cost} \tag{38} $$

where $\Delta\mathrm{cost} > 0$. Therefore, the total sum of cost functions remains the same. On the other hand, by Lemma 3.2, the rate function corresponding to $\beta_s$ increases with derivative $\beta_s$, and the rate function corresponding to $\beta_t$ decreases with derivative $\beta_t$. Because $\beta_s > \beta_t$, the total sum of rate functions increases. Therefore, $\beta_s$ and $\beta_t$ cannot be part of the optimal solution. This results in a contradiction. The theorem is proven. ■
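With a single parameter $\beta$ shared by all rounds, the upper bound can be traced out as a function of the payload. The sketch below is one way to do so, under the assumptions that the payload is given in bits per cell per round and equals $\mathrm{rate}(\beta)/\ln 2$, and that $\beta$ lies in the illustrative search interval [0, 50]. It finds $\beta$ by bisection, relying on the fact that the rate decreases as $\beta$ grows, and then returns $\alpha$ times payload over cost.

```python
import math


def cost(beta, n_levels):
    """E[V] when P(V = j) is proportional to exp(-beta * j)."""
    weights = [math.exp(-beta * j) for j in range(n_levels)]
    return sum(j * w for j, w in enumerate(weights)) / sum(weights)


def rate(beta, n_levels):
    """H(V) in nats: beta * cost + log(normalizing sum)."""
    z = sum(math.exp(-beta * j) for j in range(n_levels))
    return beta * cost(beta, n_levels) + math.log(z)


def beta_for_payload(payload_bits, n_levels, lo=0.0, hi=50.0, iters=200):
    """Solve rate(beta) = payload (converted to nats) by bisection."""
    target = payload_bits * math.log(2)
    if not 0 < target <= math.log(n_levels) + 1e-12:
        raise ValueError("payload must lie in (0, log2(K)] bits")
    for _ in range(iters):
        mid = (lo + hi) / 2
        if rate(mid, n_levels) > target:
            lo = mid  # rate still too high: beta must grow
        else:
            hi = mid
    return (lo + hi) / 2


def efficiency_upper_bound(payload_bits, n_levels, alpha=1.0):
    """alpha * payload / cost, in bits per voltage level."""
    beta = beta_for_payload(payload_bits, n_levels)
    return alpha * payload_bits / cost(beta, n_levels)
```

For $K = 8$ and $\alpha = 1$, the bound at the maximal payload of 3 bits is $3/3.5$ bits per level, and it grows as the payload shrinks.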

IV. Achievable Lower Bound using Random 
Coding Arguments 

In this section, we prove a lower bound for the coding efficiency by using a specific random coding scheme. The data encoding scheme consists of multiple stages. In every stage, the cells in the block are restricted to take one of two states, $k$ or $k-1$, where $k = 1, \ldots, K-1$. Assume that in a certain stage there are $l$ cells in the state $k-1$, and the remaining $N-l$ cells are in the state $k$. Then, during this stage, the state of only one memory cell is changed from $k$ to $k-1$, and $l(t) = \log_2 \lfloor (1-\epsilon)(N-l) \rfloor$ bits can be recorded, where $\lfloor \cdot \rfloor$ denotes the floor function and $\epsilon$ is a small real number, $0 < \epsilon < 1$.

The data encoding process is as follows. Let us throw all the sequences of symbols with length $N$ and alphabet $\{0, 1, 2, \ldots, K-1\}$ into $2^{l(t)}$ bins uniformly at random. If the to-be-recorded message is $m(t)$, then we check the $m(t)$-th bin. We try to find one sequence in the bin such that the current configuration of the memory cells can be made equal to the sequence by turning the state of one memory cell $X_n$ from $k$ to $k-1$. If such a sequence can be found, then we turn the state of the memory cell $X_n$ from $k$ to $k-1$. If no such sequence can be found in the bin, then an encoding error is declared, and we randomly turn one memory cell from $k$ to $k-1$ and go to the next coding stage.
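A single stage of this scheme can be simulated directly. The sketch below is a Monte Carlo illustration under simplifying assumptions: each of the $N - l$ configurations reachable by lowering one cell receives an independent, uniformly random bin label, and a fresh random code is drawn in every trial. The function name, the trial count, and the seed are illustrative.

```python
import math
import random


def simulate_stage_failure(n_cells, n_low, eps, trials, seed=0):
    """Estimate the probability that no one-cell change lands in the target bin.

    n_low is the number of cells already at level k - 1, so the remaining
    n_cells - n_low cells are the candidates that can be lowered.
    """
    rng = random.Random(seed)
    candidates = n_cells - n_low
    n_bins = math.floor((1 - eps) * candidates)
    if n_bins < 1:
        raise ValueError("no bins available in this stage")
    failures = 0
    for _ in range(trials):
        target = rng.randrange(n_bins)  # bin index of the message m(t)
        # Encoding fails if every reachable configuration misses the target bin.
        if all(rng.randrange(n_bins) != target for _ in range(candidates)):
            failures += 1
    return failures / trials
```

With $N = 64$, $l = 0$, and $\epsilon = 0.5$ there are 32 bins and 64 candidates, so the estimate should land close to $(1 - 1/32)^{64}$.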



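$$ f'(\beta) = \log\left(\sum_{k=0}^{K-1} \exp(-k\beta)\right) \left[ \frac{\sum_{k=0}^{K-1} \exp(-k\beta) \sum_{k=0}^{K-1} k^2 \exp(-k\beta)}{\left(\sum_{k=0}^{K-1} k \exp(-k\beta)\right)^2} - 1 \right] \tag{31} $$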



We assume that the decoder knows the random code, for example by sharing the same random source with the encoder or by using a pseudo-random source. In the first step of data decoding, the decoder determines the stage of data encoding by looking at the states of the memory cells and the number $l$ of cells in the state $k-1$. The recorded message $m(t)$ can then be recovered by looking up the bin index of the current configuration of the memory cells.

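The encoding error probability can be bounded as follows,

$$
\begin{aligned}
P(\text{error}) &\le \left(1 - \frac{1}{\lfloor (1-\epsilon)(N-l) \rfloor}\right)^{N-l} \\
&\overset{(a)}{\le} \exp\left(-\frac{N-l}{\lfloor (1-\epsilon)(N-l) \rfloor}\right) \\
&\le \exp\left(-\frac{1}{1-\epsilon}\right)
\end{aligned}
\tag{39}
$$

where (a) follows from the inequality $(1-x)^y \le \exp(-xy)$ for $x \in (0,1)$, $y > 0$.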
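The chain of inequalities in the error bound can be evaluated for concrete parameters. The sketch below computes the three quantities in turn: the probability that all $N - l$ candidates miss a given bin, the exponential bound from step (a), and the final bound that depends only on $\epsilon$. The function name and the parameter values in the usage note are illustrative.

```python
import math


def error_bounds(n_cells, n_low, eps):
    """Return (miss probability, step (a) bound, final bound) for one stage."""
    candidates = n_cells - n_low
    n_bins = math.floor((1 - eps) * candidates)
    if n_bins < 1:
        raise ValueError("no bins available in this stage")
    miss = (1 - 1 / n_bins) ** candidates
    step_a = math.exp(-candidates / n_bins)
    final = math.exp(-1 / (1 - eps))
    return miss, step_a, final
```

For instance, `error_bounds(64, 10, 0.25)` uses 54 candidates and 40 bins, and the three returned values are nondecreasing from left to right.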

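The expected total amount of recorded information between two erase operations can be bounded as

$$ E(\text{rate}) \ge (K-1) \sum_{l=0}^{N-1} \left[1 - \exp\left(-\frac{1}{1-\epsilon}\right)\right] \log_2\left(\lfloor (1-\epsilon)(N-l) \rfloor\right). \tag{40} $$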
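The growth of this lower bound with the block size can be illustrated numerically. The sketch below sums the per-stage terms over $l = 0, \ldots, N-1$ and, as an assumption made here so that the logarithm is defined, skips stages that have fewer than one bin. It then divides by $N(K-1)$, the total number of unit level decrements between two erases, to obtain bits per unit of voltage level; treating this ratio as the coding efficiency with $\alpha = 1$ is also an assumption of the sketch.

```python
import math


def expected_bits_lower_bound(n_cells, n_levels, eps):
    """Lower bound on the expected bits recorded between two erases."""
    success = 1 - math.exp(-1 / (1 - eps))
    total = 0.0
    for n_low in range(n_cells):
        n_bins = math.floor((1 - eps) * (n_cells - n_low))
        if n_bins >= 1:  # stages without a usable bin record nothing
            total += success * math.log2(n_bins)
    return (n_levels - 1) * total


def bits_per_level(n_cells, n_levels, eps):
    """Expected bits per unit level decrement over one erase cycle."""
    total_levels = n_cells * (n_levels - 1)
    return expected_bits_lower_bound(n_cells, n_levels, eps) / total_levels
```

Evaluating `bits_per_level(n, 8, 0.5)` for `n` in 16, 64, 256, 1024 gives an increasing sequence, and the value does not depend on $K$ because every level contributes the same per-stage sum.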



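For sufficiently large $N$ and $\epsilon = 0.5$, the total expected recorded information is lower bounded as

$$ E(\text{rate}) \ge \frac{(K-1)N}{2} [1 - \exp(-2)] \log(N/2). \tag{41} $$

Therefore, the coding efficiency is bounded as follows.

$$ c \ge \frac{\alpha}{2} [1 - \exp(-2)] \log(N/2) \tag{42} $$

The payload can be calculated as

$$ p = \frac{1}{N^2} \sum_{l=0}^{N-1} \log_2\left(\lfloor (1-\epsilon)(N-l) \rfloor\right). \tag{43} $$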

Based on the above discussions in this section, we arrive at 
the following theorem. 

Theorem 4.1: The optimal coding efficiency for K level N 
cell flash memories can go to infinity as N goes to infinity. 

V. The Coding-Efficiency-to-Payload Tradeoff 

Some important insights can be gained from the upper and lower bounds for coding efficiency proved in the previous sections. From the upper bound, it can be seen that the coding efficiency decreases as the payload increases. From the lower bound, it can be seen that the coding efficiency may go to infinity as the payload decreases to zero. Therefore, we conclude that there exists a tradeoff between the coding efficiency and the payload. The tradeoff is illustrated in Fig. 1, which shows the upper and lower bounds for coding efficiency, with the payload on the x-axis. We assume $\alpha = 1$, and the flash memories are 8-level (3-bit) TLC-type flash memories.

Fig. 1. Upper and lower bounds for coding efficiency of 3-bit flash memory cells.



VI. Conclusion 

In this paper, we study the coding efficiency problem for flash memories. A coding theorem for achievable rates is proven. We prove upper and lower bounds for the coding efficiency. We show that there exists a tradeoff between the coding efficiency and the payload. The discussion in this paper provides useful insights on the design of future flash memory systems.
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Abstract — Recently, flash memories have become a competitive 
solution for mass storage. The flash memories have rather dif- 
ferent properties compared with the rotary hard drives. That is, 
the writing of flash memories is constrained, and flash memories 
can endure only limited numbers of erases. Therefore, the design 
goals for the flash memory systems are quite different from these 
for other memory systems. In this paper, we consider the problem 
of coding efficiency. We define the "coding-efficiency" as the 
amount of information that one flash memory cell can be used 
to record per cost. Because each flash memory cell can endure a 
roughly fixed number of erases, the cost of data recording can be 
well-defined. We define "payload" as the amount of information 
that one flash memory cell can represent at a particular moment. 
By using information-theoretic arguments, we prove a coding 
theorem for achievable coding rates. We prove an upper and 
lower bound for coding efficiency. We show in this paper that 
there exists a fundamental trade-off between "payload" and 
"coding efficiency". The results in this paper may provide useful 
insights on the design of future flash memory systems. 

I. Introduction 

Recently, flash memories have become a competitive solu- 
tion for mass storage. Compared with the conventional rotary 
hard drives, flash memories have high random access read 
speed, because there is no mechanical seek time. Flash mem- 
ory storage devices are also more lightweight, power efficient, 
and kinetic shock resistant. Therefore, they are becoming 
desirable choices for many applications ranging from high- 
speed servers in data centers to portable devices. 

Flash memories are one type of solid state memories. Each 
piece of flash memory usually contains multiple arrays of 
flash memory cells. Each memory cell is a transistor with 
a floating gate. Information is recorded using one memory 
cell by injecting and removing electrons into and from the 
floating gate. The process of injecting electrons is called 
programming and the process of removing electrons is called 
erase. Programming increases the threshold voltage level of the 
memory cell, while erase decreases the threshold voltage level. 
The threshold voltage level of the memory cell is the voltage 
level at the control gate that the transistor becomes conducting. 
In the reading process for the memory cell, the threshold 
voltage level is detected, thus the recorded information can 
be recovered. 

The memory cells are organized into pages and then into 
blocks. The programming is page-wise and erase is block- 
wise. Usually, one memory block is first erased, so that all 
memory cells within the block return to an initial threshold 
voltage level. After the erase operation, the pages in the 
block are programmed (possibly multiple times), until normal 



threshold voltage level ranges are used up. Then, the memory 
block is erased again for further use. 

One challenge for flash memories is that the number of erase 
operations that one memory cell can withstand is quite limited. 
For current commercial flash memories, such maximal num- 
bers of block erase operations range from 5,000 to 100,000. 
After such a limited number of erase operations, the flash 
memory cell would become broken or unreliable. Therefore, 
data encoding methods must be carefully designed to address 
such an issue. 

In fact, flash memories can be considered as one type of 
write-once-memories. The write-once-memories were first dis- 
cussed in the seminal work by Rivest and Shamir [?]. Previous 
examples of write-once-memories include digital optical disks, 
punched paper tapes, punched cords, and programmable read- 
only memories etc. Rivest and Shamir show that by using 
advanced data encoding methods, the write-once-memories 
can be rewritten. In [?], one theorem for the achievable 
data recording rates of binary write-once-memories has been 
proven using combinatorial arguments. During the passed 
research, many data encoding methods for rewriting the write- 
once-memories have been proposed, see for example, [?] [?] 
etc. 

In this paper, we consider a coding efficiency problem 
for data encoding on flash memories. Unlike other type of 
computer memories, the cost of data encoding can be well- 
defined for flash memories. That is, the cost for each erase 
operation can be defined based on the cost of the flash memory 
block and the total number of erase operations that the memory 
block can have. The coding efficiency problem is therefore 
the problem of recording more information using fewer erase 
operations. To our best knowledge, such a design problem for 
flash memories has never been discussed before. 

We assume that one flash memory block has N cells, and 
each cell can take K voltage levels. We assume that the data 
encoding scheme uses the memory block for T rounds between 
two consecutive erase operations. That is, in the first round, a 
message A/[l] is recorded using the block, and in the second 
round, a new message M [2] is recorded, and so on. Suppose 
that Nl t bits are recorded during the <-th round. We define the 
payload p and coding efficiency c as 

t=l t=l 
where, a is a constant depending on the type of the memory 
block, e.g., NOR type, NAND type, single-level-cell, multi- 
level-cell etc. The constant a may be used to reflect the cost 



for the flash memory block. It should be clear that the coding 
efficiency measures the amount of recording information per 
voltage level cost. We may also define the voltage level cost 
per recorded bit, which is exactly 1/c. 

In this paper, we first prove a coding theorem for achievable 
rates of data encoding on flash memories using information- 
theoretic arguments. Using the coding theorem in this paper, 
we prove an upper bound for the optimal coding efficiency. 
We also show a lower bound of optimal coding efficiency 
using a specific coding scheme. Surprisingly, we find that there 
exists a tradeoff between the optimal coding efficiency and 
payload. These results may provide useful insights and tools 
for designing future flash memory storage systems. 

The rest of this paper is organized as follows. In Section HIl 
we present the coding theorem for achievable coding rates. In 
Section [TTT1 we show the upper bound of the optimal coding 
efficiency. In Section [IVj we present the lower bound for 
optimal coding efficiency using a specific coding scheme. The 
coding efficiency to payload tradeoff is discussed in Section 
[Vl Some concluding remarks are presented at Section [Vl] 

II. Coding Theorem 

We consider a memory block with N memory cells. Each 
memory cell can take K threshold voltage levels, that is, each 
memory cell can be at one of the states 0, 1, . . . , K — 1. After 
one erase operation, all memory cells are at the state K — 1. 
During each programming process, the state of each cell can 
be decreased but never increased. Assume that the memory 
block can be reliably used for T rounds of information record- 
ing, where messages M (1), M(2), ...,M (t), ...,M (T) are 
recorded. We define the corresponding data rate in the t- 
th round l(t) = log 2 (\M(t)\)/N, where \M(t)\ denote the 
alphabet size of the message M(t). In this case, we say that 
the sequence of data rates l{t), t — 1, . . . , T is achievable. We 
assume that all the T messages are statistical independent. 
We denote the state of the n-th cell in the block during 
time t by X n (t). We use the notation X^ (t) to denote the 
sequence X±(t), Xzft), . . . , Xjf(t). Similarly, Xi(t) denotes 
the sequence Xi(t), X2(t), . . . ,X n (t), where 1 < n < N. 
We use H(-) to denote the entropy and conditional entropy 
functions as in [?]. 

Theorem 2.1: A sequence of data rates l(t), t = 1, . . . ,T 
is achievable, if and only if, there exist random variables 
U(l), . . . ,U(T) jointly distributed with a probability distri- 
bution P (77(1), . . . , U{T)), such that, 



V(U(t)=j\U(t-l) 
l(t) <H(U(t)\U(t- 
1(1) <H ([/(!)). 



= i) = 0, if j > i, for t 
1)), fori = 2,...,T, 



2,.. 



(2) 



By convention, U(0) = K — 1 with probability 1. 

Proof: The achievable part is proven by random binning. 
For the <-th round of data recording, we construct a random 
code by throwing typical sequences of U(t) into exp {Nl(t)} 
bins uniformly in random. The message m(t) is encoded by 
finding a sequence X^(t) in the m(t)-th bin, such that the 



sequence X^ (t) is jointly typical with X^ (t — 1). If such 
a sequence can not be found, then one encoding error is 
declared. 

Suppose that l(t) < H{U(t)\U(t- 1)) - 2e, where e is 
an arbitrarily small positive number. Then, the probability of 
encoding error can be upper bounded as follows. 



(error) 



1 



1 



(a) 

< exp 

(b) 

< exp 

< exp ( 



exp(Nl(t)) 



exp(Nl(t)) 

exp(N(H(U(t)\U(t-l))-e)) 
cxp{N (H(U(t)\U(t- 1)) - 2t)} / 
exp(eiV)) (3) 



where, N\ denotes the number of typical sequences X^(t) 
that are jointly typical with X^(t — 1), (a) follows from the 
inequality, (1 — x) < exp(— x), for < x < 1, (b) follows 
from the fact that iVi > exp {N (H(U(t)\U(t — 1)) — e)}. 
The achievable part of the proof then follows from the fact 
that e can be taken arbitrarily small. 

We prove the converse part by constructing some random 
variables U(l), . . . , U(T), which satisfy the conditions in the 
theorem. Assume that there exists at least one coding scheme, 
which satisfies the conditions in the theorem. 

In the first step, we wish to show 

H(M{t))<H(X?{t)\X?(t-l)) (4) 
This is because, on the one hand, 

H(M(t),X?(t)\X»(t-l)) 

= H (X»{t)\X?{t -l))+H (M(t)\X»(t),X?(t 1)) 

^H(X»(t)\X?(t-l)) (5) 

where, (a) follows from the fact that M(t) can be completely 
determined by observing X± (t). On the other hand, 

H{M(t),X»(t)\X»(t-l)) 

= H (M(t)\X?(t -l))+H (X»(t)\M(t), X»(t 1)) 



(a) 



= ; H (M(t)) + H (Xf (t)|M(t),Xf(i - 1)) 



(6) 



where, (a) follows from the fact that M (t) is independent of 
X»(t-1). 

In the second step, we can show that 



N 



(7) 



H(M{t)) <Y,H{X n {t)\X n (t-l)) 

n=l 

This is because, 

N 

H {X?(t)\X?tt -1))=J2 H {X n (t)\xr\t),X?{t I)) 



n=l 
N 

<^H(X n (t)\X n (t-l)) (8) 

n=l 



where the last inequality follows from the fact that conditions 
do not increase entropy. 

Let us define random variables Z, U(l), 17(2), , . . . , U(T) 
as follows. The random variable Z takes values in 
{1,2,..., iV} uniformly in random. 



U(t)=X n {t), if Z = n 



(9) 



The probability distribution of the random variables 
Z, 17(1), U{2), U(T) can be factored as follows. 

T 

¥(Z) H P(U(t)\U(l), ...,U(t- l),Z) (10) 

It can be checked that 

P (?/(*) = j|Z7(t - 1) = i) = 0, if j>i (11) 
Finally, we wish to show that 

Nl{t) = H (M{t)) < NH (U(t)\U{t - 1)) (12) 
This is because 

N 



H(M(t)) <J2 H (X n (t)\X n (t - 1)) 

(a) 



n=l 

NH(U(t)\U(t-l),Z) 



< NH (U(t)\U(t — 1)) 



(13) 



where, (a) follows from the definition of Z, (b) follows from 
the fact that conditions do not increase entropy. 

Therefore, we have constructed the random variables
$U(1), \ldots, U(T)$, which satisfy the conditions in the theorem.
The theorem is proven.

■ 

III. Upper Bound 

In this section, we prove an upper bound for the achievable
coding efficiency. The coding efficiency can be calculated
from Theorem 2.1 by forming an optimization problem. Let us
define a random variable $V(t) = U(t-1) - U(t)$ with alphabet
$\{0, 1, \ldots, K-1\}$. For a given payload $p$, the optimization
problem is as follows.

$$\min_{P(V(1), \ldots, V(t), \ldots, V(T))} \; E\Big[\sum_t V(t)\Big] \tag{14}$$

$$\text{Subject to: } \sum_t H(V(t) \mid U(t-1)) \ge Tp \tag{15}$$

$$P\Big(\sum_t V(t) > K-1\Big) = 0 \tag{16}$$

By convention, $U(0) = K-1$ with probability 1. It should
be clear that the coding efficiency satisfies

$$c \le \frac{\alpha T p}{\sum_t E(V(t)^*)} \tag{17}$$

where $V(t)^*$ denotes the minimizer of the optimization problem.



However, the above optimization problem is difficult to
solve in closed form. We will consider instead a relaxed
optimization problem. First, we remove the constraint in
Eqn. (16). Second, we relax the constraint
$\sum_t H(V(t) \mid U(t-1)) \ge Tp$ to $\sum_t H(V(t)) \ge Tp$,
which is valid because conditioning does not increase entropy.
Thus, the original optimization problem becomes

$$\min_{P(V(1), \ldots, V(t), \ldots, V(T))} \; E\Big[\sum_t V(t)\Big]
\quad \text{Subject to: } \sum_t H(V(t)) \ge Tp \tag{18}$$

In a final step, because the constraint and the objective
function depend only on the marginal distributions of $V(t)$, we
may further relax the above optimization problem by replacing
the joint distribution

$$P(V(1), \ldots, V(t), \ldots, V(T)) \tag{19}$$

with a set of pseudo marginal distributions

$$P(V(1)), \ldots, P(V(t)), \ldots, P(V(T)) \tag{20}$$

The pseudo marginal distributions may or may not correspond
to a joint distribution. The final relaxed optimization problem
is thus as follows.

$$\min_{P(V(1)), \ldots, P(V(t)), \ldots, P(V(T))} \; E\Big[\sum_t V(t)\Big]
\quad \text{Subject to: } \sum_t H(V(t)) \ge Tp \tag{21}$$

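Using the Lagrangian method, we can find that the optimal
distribution for $V(t)$ takes the following form

$$P(V(t) = j) = \frac{\exp(-\beta_t j)}{\sum_{s=0}^{K-1} \exp(-\beta_t s)} \tag{22}$$

for a certain parameter $\beta_t > 0$. Let us define the cost function
$\mathrm{cost}(\beta_t)$ and rate function $\mathrm{rate}(\beta_t)$ at the
$t$-th data encoding round as follows.

$$\mathrm{cost}(\beta_t) = E[V(t)], \qquad \mathrm{rate}(\beta_t) = H(V(t)) \tag{23}$$

where $V(t)$ has the probability distribution in Eqn. (22). Both
functions have closed-form formulas,

$$\mathrm{cost}(\beta_t) = \frac{\sum_{j=0}^{K-1} j \exp(-\beta_t j)}{\sum_{s=0}^{K-1} \exp(-\beta_t s)}, \qquad
\mathrm{rate}(\beta_t) = \beta_t \, \mathrm{cost}(\beta_t) + \log\Big(\sum_{s=0}^{K-1} \exp(-\beta_t s)\Big) \tag{24}$$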
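
As a hedged sketch of where the exponential form in Eqn. (22) and the rate formula in Eqn. (24) come from, consider a single stage whose entropy is held at some value $h_t$ while its expected cost is minimized. The symbols $h_t$, $p_j$, $\lambda_t$ and $\mu_t$ are introduced here for illustration only and are not notation from the paper; logarithms are natural.

```latex
\begin{aligned}
L &= \sum_{j=0}^{K-1} j\, p_j
     - \lambda_t \Big( -\sum_{j=0}^{K-1} p_j \log p_j - h_t \Big)
     + \mu_t \Big( \sum_{j=0}^{K-1} p_j - 1 \Big), \\
\frac{\partial L}{\partial p_j}
  &= j + \lambda_t (\log p_j + 1) + \mu_t = 0
  \;\Longrightarrow\;
  p_j \propto \exp(-\beta_t j), \quad \beta_t = 1/\lambda_t, \\
H(V(t)) &= -\sum_{j} p_j \log p_j
  = \sum_{j} p_j \Big( \beta_t j + \log \sum_{s=0}^{K-1} e^{-\beta_t s} \Big)
  = \beta_t \, \mathrm{cost}(\beta_t) + \log \sum_{s=0}^{K-1} e^{-\beta_t s}.
\end{aligned}
```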



Theorem 3.1: The coding efficiency $c$ is upper bounded by

$$c \le \frac{\alpha \sum_t \mathrm{rate}(\beta_t)}{\sum_t \mathrm{cost}(\beta_t)} \tag{25}$$

where $\beta_t$ corresponds to the solution to the relaxed
optimization problem in Eqn. (21).



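Proof: The relaxed problem is a minimization over a larger
feasible set, so its optimal value is less than or equal to the
optimal value of the original optimization problem. The expected
cost in the denominator of the efficiency bound can therefore only
decrease under the relaxation, and the bound on $c$ remains valid. ■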
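
The closed-form cost and rate functions that enter Theorem 3.1 can be checked numerically. The following is a minimal sketch, assuming natural logarithms (rates in nats); the function names are illustrative and not from the paper. It compares the closed-form rate against the entropy computed directly from the distribution.

```python
import math

def stage_pmf(beta, K):
    """P(V = j) proportional to exp(-beta * j) for j = 0..K-1."""
    weights = [math.exp(-beta * j) for j in range(K)]
    total = sum(weights)
    return [w / total for w in weights]

def cost(beta, K):
    """Expected level decrease E[V] under stage_pmf."""
    return sum(j * p for j, p in enumerate(stage_pmf(beta, K)))

def rate_closed_form(beta, K):
    """beta * cost(beta) + log(sum_s exp(-beta * s)), in nats."""
    partition = sum(math.exp(-beta * s) for s in range(K))
    return beta * cost(beta, K) + math.log(partition)

def entropy(pmf):
    """Shannon entropy in nats, computed directly from the pmf."""
    return -sum(p * math.log(p) for p in pmf if p > 0)

if __name__ == "__main__":
    K = 8  # 8-level cells, as in the paper's figure
    for beta in (0.0, 0.5, 1.0, 2.0):
        print(beta, cost(beta, K), rate_closed_form(beta, K),
              entropy(stage_pmf(beta, K)))
```

At $\beta = 0$ the distribution is uniform, so the cost is $(K-1)/2$ and the rate is $\log K$; both values drop as $\beta$ grows.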
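In our further discussion, we need to define a stage coding
efficiency function

$$f(\beta) = \frac{\mathrm{rate}(\beta)}{\mathrm{cost}(\beta)} \tag{26}$$

Lemma 3.2:

$$\frac{d(\mathrm{rate}(\beta))}{d(\mathrm{cost}(\beta))} = \beta \tag{27}$$

Proof:

$$\begin{aligned}
\frac{d\,\mathrm{rate}(\beta)}{d\,\mathrm{cost}(\beta)}
&= \frac{\mathrm{cost}(\beta) + \beta\,\mathrm{cost}'(\beta) + \sum_k -k \exp(-\beta k) \big/ \sum_s \exp(-\beta s)}{\mathrm{cost}'(\beta)} \\
&= \frac{\mathrm{cost}(\beta) + \beta\,\mathrm{cost}'(\beta) - \mathrm{cost}(\beta)}{\mathrm{cost}'(\beta)} = \beta
\end{aligned} \tag{28}$$

where the derivatives on the right-hand side are with respect
to $\beta$. ■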
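
Lemma 3.2 can be sanity-checked with a finite-difference estimate. This is a sketch under the closed forms for the cost and rate functions (natural logarithms assumed); the helper names and the step size `h` are illustrative choices, not part of the paper.

```python
import math

def partition(beta, K):
    """Normalizer sum_k exp(-beta * k) over k = 0..K-1."""
    return sum(math.exp(-beta * k) for k in range(K))

def cost(beta, K):
    """Expected level decrease under the exponential distribution."""
    return sum(k * math.exp(-beta * k) for k in range(K)) / partition(beta, K)

def rate(beta, K):
    """Entropy in nats: beta * cost + log(partition)."""
    return beta * cost(beta, K) + math.log(partition(beta, K))

def slope_rate_vs_cost(beta, K, h=1e-5):
    """Central finite-difference estimate of d rate / d cost at beta."""
    d_rate = rate(beta + h, K) - rate(beta - h, K)
    d_cost = cost(beta + h, K) - cost(beta - h, K)
    return d_rate / d_cost

if __name__ == "__main__":
    for beta in (0.3, 1.3, 2.5):
        # The estimate should be close to beta itself.
        print(beta, slope_rate_vs_cost(beta, 8))
```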

Lemma 3.3: The function $\mathrm{cost}(\beta)$ is a decreasing function
with respect to $\beta$.

Proof: In order to show that $\mathrm{cost}(\beta)$ is a decreasing
function, it is sufficient to show that $\log(\mathrm{cost}(\beta))$ is a
decreasing function. The derivative of $\log(\mathrm{cost}(\beta))$ is

$$\frac{\sum_{k=0}^{K-1} k \exp(-k\beta)}{\sum_{k=0}^{K-1} \exp(-k\beta)}
 - \frac{\sum_{k=0}^{K-1} k^2 \exp(-k\beta)}{\sum_{k=0}^{K-1} k \exp(-k\beta)} \tag{29}$$

By using the Cauchy-Schwarz inequality, we have

$$\Big(\sum_{k=0}^{K-1} k \exp(-k\beta)\Big)^2
 \le \Big(\sum_{k=0}^{K-1} \exp(-k\beta)\Big)\Big(\sum_{k=0}^{K-1} k^2 \exp(-k\beta)\Big) \tag{30}$$

and the equality holds only when $\beta$ goes to infinity. It thus
follows that the derivative of $\log(\mathrm{cost}(\beta))$ is strictly
negative for any finite $\beta$. The lemma follows. ■

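Lemma 3.4: The function $f(\beta)$ is an increasing function
with respect to $\beta$.

Proof: The derivative of $f(\beta)$ is given in Eqn. (31).
The lemma is proven if we can show that

$$\frac{\sum_{k=0}^{K-1} \exp(-k\beta) \, \sum_{k=0}^{K-1} k^2 \exp(-k\beta)}{\Big(\sum_{k=0}^{K-1} k \exp(-k\beta)\Big)^2} > 1 \tag{32}$$

That is,

$$\Big(\sum_{k=0}^{K-1} k \exp(-k\beta)\Big)^2
 < \Big(\sum_{k=0}^{K-1} \exp(-k\beta)\Big)\Big(\sum_{k=0}^{K-1} k^2 \exp(-k\beta)\Big) \tag{33}$$

We can show that this is indeed the case by using the
Cauchy-Schwarz inequality,

$$\Big(\sum_k x_k y_k\Big)^2 \le \Big(\sum_k x_k^2\Big)\Big(\sum_k y_k^2\Big) \tag{34}$$

applied with $x_k = \exp(-k\beta/2)$ and $y_k = k \exp(-k\beta/2)$.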
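
The monotonicity claims in Lemma 3.3 and Lemma 3.4 can be illustrated numerically on a grid of $\beta$ values. The sketch below assumes the closed forms for cost and rate with natural logarithms; the grid and the choice $K = 8$ are arbitrary illustrative values.

```python
import math

def partition(beta, K):
    """Normalizer sum_k exp(-beta * k) over k = 0..K-1."""
    return sum(math.exp(-beta * k) for k in range(K))

def cost(beta, K):
    """Expected level decrease under the exponential distribution."""
    return sum(k * math.exp(-beta * k) for k in range(K)) / partition(beta, K)

def rate(beta, K):
    """Entropy in nats: beta * cost + log(partition)."""
    return beta * cost(beta, K) + math.log(partition(beta, K))

def stage_efficiency(beta, K):
    """f(beta) = rate(beta) / cost(beta)."""
    return rate(beta, K) / cost(beta, K)

def is_strictly_decreasing(values):
    return all(a > b for a, b in zip(values, values[1:]))

def is_strictly_increasing(values):
    return all(a < b for a, b in zip(values, values[1:]))

if __name__ == "__main__":
    K = 8
    betas = [0.1 * i for i in range(1, 51)]  # 0.1, 0.2, ..., 5.0
    print(is_strictly_decreasing([cost(b, K) for b in betas]))
    print(is_strictly_increasing([stage_efficiency(b, K) for b in betas]))
```

A grid check of this kind is only an illustration; the proofs in the lemmas cover all finite $\beta$.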



Theorem 3.5: In the solution to the optimization problem
in Eqn. (21),

$$\beta_1 = \beta_2 = \cdots = \beta_t = \cdots = \beta_T = \beta \tag{35}$$

Therefore, the coding efficiency satisfies

$$c \le \frac{\alpha \, \mathrm{rate}(\beta)}{\mathrm{cost}(\beta)} \tag{36}$$

Proof: The theorem is proven by contradiction. Suppose
that in the optimal solution for Eqn. (21) there exist $\beta_s$ and
$\beta_t$ such that $\beta_s > \beta_t$. According to Lemma 3.3,
$\mathrm{cost}(\beta_s) < \mathrm{cost}(\beta_t)$. We may modify $\beta_s$ and
$\beta_t$ slightly into $\beta_s - \Delta\beta_s$ and $\beta_t + \Delta\beta_t$,
such that

$$\mathrm{cost}(\beta_s - \Delta\beta_s) = \mathrm{cost}(\beta_s) + \Delta\mathrm{cost} \tag{37}$$

$$\mathrm{cost}(\beta_t + \Delta\beta_t) = \mathrm{cost}(\beta_t) - \Delta\mathrm{cost} \tag{38}$$

where $\Delta\mathrm{cost} > 0$. Therefore, the total sum of cost functions
remains the same. On the other hand, by Lemma 3.2, the rate function
corresponding to $\beta_s$ increases with derivative $\beta_s$, and the rate
function corresponding to $\beta_t$ decreases with derivative $\beta_t$.
Because $\beta_s > \beta_t$, the total sum of rate functions increases.
Therefore, $\beta_s$ and $\beta_t$ cannot be part of the optimal solution.
This results in a contradiction. The theorem is proven. ■
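
With all stages sharing one parameter $\beta$, the upper bound can be evaluated as a function of the payload. The following is a minimal sketch, assuming the entropy constraint is met with equality (so that $\mathrm{rate}(\beta) = p$), payload measured in nats, and $\alpha = 1$ by default; the bisection solver and all function names are illustrative and not from the paper.

```python
import math

def partition(beta, K):
    """Normalizer sum_k exp(-beta * k) over k = 0..K-1."""
    return sum(math.exp(-beta * k) for k in range(K))

def cost(beta, K):
    """Expected level decrease under the exponential distribution."""
    return sum(k * math.exp(-beta * k) for k in range(K)) / partition(beta, K)

def rate(beta, K):
    """Entropy in nats: beta * cost + log(partition)."""
    return beta * cost(beta, K) + math.log(partition(beta, K))

def beta_for_payload(p, K, lo=0.0, hi=50.0, iterations=200):
    """Solve rate(beta) = p by bisection.

    rate falls from log(K) at beta = 0 toward 0 as beta grows,
    so p must lie strictly between 0 and log(K).
    """
    if not 0.0 < p < math.log(K):
        raise ValueError("payload must lie in (0, log K) nats")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if rate(mid, K) > p:
            lo = mid  # rate still too high: beta must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)

def efficiency_upper_bound(p, K, alpha=1.0):
    """alpha * rate(beta) / cost(beta) at the beta matching payload p."""
    beta = beta_for_payload(p, K)
    return alpha * rate(beta, K) / cost(beta, K)

if __name__ == "__main__":
    for p in (0.25, 0.5, 1.0, 1.5, 2.0):
        print(p, efficiency_upper_bound(p, 8))
```

The printed bound shrinks as the payload grows, which is the direction of the tradeoff the upper bound describes.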

IV. Achievable Lower Bound using Random 
Coding Arguments 

In this section, we prove a lower bound for the coding
efficiency by using a specific random coding scheme. The
data encoding scheme consists of multiple stages. During any
given stage, the cells in the block are restricted to take one of
two states, $k$ or $k-1$, where $k = 1, \ldots, K-1$. Assume that in
a certain stage there are $l$ cells in state $k-1$, and
the remaining $N-l$ cells are in state $k$. Then, during this stage,
the state of only one memory cell is changed from $k$ to $k-1$,
and $l(t) = \log_2 \lfloor (1-\epsilon)(N-l) \rfloor$ bits can be recorded,
where $\lfloor \cdot \rfloor$ denotes the floor function and $\epsilon$ is a
small real number, $0 < \epsilon < 1$.

The data encoding process is as follows. Let us throw
all the sequences of symbols with length $N$ and alphabet
$\{0, 1, 2, \ldots, K-1\}$ into $2^{l(t)}$ bins uniformly at random. If
the to-be-recorded message is $m(t)$, then we check the $m(t)$-th
bin. We try to find one sequence in the bin such that the
current configuration of the memory cells can be modified to
equal the sequence by turning the state of one memory
cell $X_n$ from $k$ to $k-1$. If such a sequence can be found, then
we turn the state of the memory cell $X_n$ from $k$ to $k-1$. If
no such sequence can be found in the bin, then an error is
declared, and we randomly turn one memory cell from
$k$ to $k-1$ and go to the next coding stage.
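
One stage of this binning scheme can be sketched in code. The version below is an illustration under several assumptions that are not from the paper: bins are assigned lazily by hashing a configuration with a shared seed (standing in for the shared random source), the fallback on failure lowers the first eligible cell rather than a random one, and the decoder takes the highest level present as the current $k$.

```python
import hashlib
import math

SEED = b"shared-random-source"  # hypothetical seed known to both sides

def bin_index(config, num_bins):
    """Pseudo-random, approximately uniform bin of a configuration."""
    digest = hashlib.sha256(SEED + bytes(config)).digest()
    return int.from_bytes(digest[:8], "big") % num_bins

def bins_for_stage(cells_at_k, eps):
    """Number of bins, floor((1 - eps) * (N - l)), for this stage."""
    return math.floor((1 - eps) * cells_at_k)

def encode_stage(config, k, message, eps=0.5):
    """Lower one cell from k to k-1 so the result falls in bin `message`.

    Returns (new_config, success). On failure one cell is still lowered,
    mirroring the move to the next stage after an error.
    """
    candidates = [n for n, state in enumerate(config) if state == k]
    num_bins = bins_for_stage(len(candidates), eps)
    if num_bins < 1 or not 0 <= message < num_bins:
        raise ValueError("message does not fit in this stage")
    for n in candidates:
        trial = list(config)
        trial[n] = k - 1
        if bin_index(trial, num_bins) == message:
            return trial, True
    fallback = list(config)
    fallback[candidates[0]] = k - 1
    return fallback, False

def decode_stage(config, eps=0.5):
    """Recover the message from the configuration left by the last write."""
    k = max(config)  # at least one cell is still at level k after a write
    cells_at_k_before_write = sum(1 for s in config if s == k) + 1
    return bin_index(config, bins_for_stage(cells_at_k_before_write, eps))

if __name__ == "__main__":
    start = [3] * 64  # 64 cells, all at level 3
    written, ok = encode_stage(start, 3, message=5)
    print(ok, decode_stage(written) if ok else None)
```

Hashing avoids enumerating all $K^N$ sequences; only the $N - l$ configurations reachable by a single cell change are ever examined.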



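$$f'(\beta) = \log\Big(\sum_{k=0}^{K-1} \exp(-k\beta)\Big)
\left[ \frac{\sum_{k=0}^{K-1} \exp(-k\beta) \, \sum_{k=0}^{K-1} k^2 \exp(-k\beta)}{\Big(\sum_{k=0}^{K-1} k \exp(-k\beta)\Big)^2} - 1 \right] \tag{31}$$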



We assume that the data decoding process knows the
random coding scheme, for example, by sharing the same
random source with the encoder or by using a pseudo-random
source. In the first step of data decoding, the decoder can
determine the stage of data encoding by looking at the states
of the memory cells and the number $l$ of cells in
state $k-1$. The recorded message $m(t)$ can then be recovered
by looking at the bin index of the current configuration of the
memory cells.

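The encoding error probability can be bounded as follows,

$$\begin{aligned}
P(\text{error}) &\le \left(1 - \frac{1}{\lfloor (1-\epsilon)(N-l) \rfloor}\right)^{N-l} \\
&\overset{(a)}{\le} \exp\left(-\frac{N-l}{\lfloor (1-\epsilon)(N-l) \rfloor}\right)
\le \exp\left(-\frac{1}{1-\epsilon}\right)
\end{aligned} \tag{39}$$

where (a) follows from the inequality $(1-x)^y \le \exp(-xy)$,
for $x \in (0,1)$, $y > 0$.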
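
The chain of inequalities in the error bound can be checked numerically for concrete values. This is a small sketch with illustrative function names; it evaluates the probability that none of the $N - l$ candidate configurations lands in the target bin, together with the two bounds on it.

```python
import math

def num_bins(N, l, eps):
    """floor((1 - eps) * (N - l)), the number of bins in the stage."""
    return math.floor((1 - eps) * (N - l))

def miss_probability(N, l, eps):
    """Chance that none of the N - l candidates lands in the target bin."""
    b = num_bins(N, l, eps)
    if b < 1:
        raise ValueError("stage has no bins")
    return (1.0 - 1.0 / b) ** (N - l)

def intermediate_bound(N, l, eps):
    """exp(-(N - l) / floor((1 - eps) * (N - l)))."""
    return math.exp(-(N - l) / num_bins(N, l, eps))

def final_bound(eps):
    """exp(-1 / (1 - eps)), independent of N and l."""
    return math.exp(-1.0 / (1.0 - eps))

if __name__ == "__main__":
    N, eps = 1000, 0.5
    for l in (0, 500, 900, 998):
        print(l, miss_probability(N, l, eps),
              intermediate_bound(N, l, eps), final_bound(eps))
```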

The expected total amount of recorded information between
two erase operations can be bounded as

$$E(\text{rate}) \ge (K-1) \sum_{l=0}^{N} \left[1 - \exp\left(-\frac{1}{1-\epsilon}\right)\right]
\log_2\big(\lfloor (1-\epsilon)(N-l) \rfloor\big) \tag{40}$$



For sufficiently large $N$ and $\epsilon = 0.5$, the total expected
recorded information is lower bounded as

$$E(\text{rate}) \ge \frac{(K-1)N}{2} \, [1 - \exp(-2)] \log(N/2) \tag{41}$$

Therefore, the coding efficiency is bounded as follows.

$$c \ge \frac{1}{2} \, [1 - \exp(-2)] \log(N/2) \tag{42}$$

The payload can be calculated as

$$p = \frac{1}{N^2} \sum_{l=0}^{N-1} \log_2\big(\lfloor (1-\epsilon)(N-l) \rfloor\big) \tag{43}$$

Based on the above discussions in this section, we arrive at 
the following theorem. 

Theorem 4.1: The optimal coding efficiency for $K$-level, $N$-cell
flash memories can go to infinity as $N$ goes to infinity.
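
The growth behind Theorem 4.1 can be illustrated by evaluating the lower bound on the expected recorded information for increasing $N$. The sketch below makes assumptions that are not taken from the paper: stages whose bin count would be zero are skipped, the result is normalized by the $N(K-1)$ single-level decrements available between two erases, and $\alpha = 1$. The function names are illustrative.

```python
import math

def expected_bits_lower_bound(N, K, eps=0.5):
    """Sum of the per-stage bound over all stages between two erases."""
    success = 1.0 - math.exp(-1.0 / (1.0 - eps))
    total = 0.0
    for l in range(N):
        bins = math.floor((1 - eps) * (N - l))
        if bins >= 1:  # assumption: stages with no bins store nothing
            total += success * math.log2(bins)
    return (K - 1) * total

def bits_per_level_decrement(N, K, eps=0.5):
    """Assumed normalization: bits divided by N * (K - 1) decrements."""
    return expected_bits_lower_bound(N, K, eps) / (N * (K - 1))

if __name__ == "__main__":
    for N in (10**2, 10**3, 10**4, 10**5):
        print(N, bits_per_level_decrement(N, 8))
```

Under these assumptions the normalized value keeps growing, roughly logarithmically in $N$.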

V. The Coding-Efficiency-to-Payload Tradeoff 

Some important insights can be gained from the upper and
lower bounds for coding efficiency proved in the previous
sections. From the upper bound, it can be seen clearly that
the coding efficiency decreases as the payload increases. From
the lower bound, it can be seen that the coding efficiency may
go to infinity as the payload decreases to zero. Therefore, we
can conclude that there exists a tradeoff between the coding
efficiency and payload. The tradeoff is illustrated in Fig. 1. In
the figure, the upper and lower bounds for coding efficiency
are shown, where the x-axis shows the payload. We assume
$\alpha = 1$, and the flash memories are 8-level (3-bit) TLC-type
flash memories.
Fig. 1. Upper and lower bounds for coding efficiency of 3-bit flash memory
cells.



VI. Conclusion 

In this paper, we study the coding efficiency problem for 
flash memories. A coding theorem for achievable rates is 
proven. We prove upper and lower bounds for the coding
efficiency. We show that there exists a tradeoff between the 
coding efficiency and payload. Our discussions in this paper 
provide useful insights on the design of future flash memory 
systems. 



